23 research outputs found

    Streaming histogram sketching for rapid microbiome analytics

    Background: The growth in publicly available microbiome data in recent years has yielded an invaluable resource for genomic research, allowing for the design of new studies, augmentation of novel datasets and reanalysis of published works. This vast amount of microbiome data, as well as the widespread proliferation of microbiome research and the looming era of clinical metagenomics, means there is an urgent need to develop analytics that can process huge amounts of data in a short amount of time. To address this need, we propose a new method for the compact representation of microbiome sequencing data using similarity-preserving sketches of streaming k-mer spectra. These sketches allow for dissimilarity estimation, rapid microbiome catalogue searching and classification of microbiome samples in near real time. Results: We apply streaming histogram sketching to microbiome samples as a form of dimensionality reduction, creating a compressed ‘histosketch’ that can efficiently represent microbiome k-mer spectra. Using public microbiome datasets, we show that histosketches can be clustered by sample type using the pairwise Jaccard similarity estimation, consequently allowing for rapid microbiome similarity searches via a locality-sensitive hashing indexing scheme. Furthermore, we use a ‘real life’ example to show that histosketches can train machine learning classifiers to accurately label microbiome samples. Specifically, using a collection of 108 novel microbiome samples from a cohort of premature neonates, we trained and tested a random forest classifier that could accurately predict whether the neonate had received antibiotic treatment (97% accuracy, 96% precision) and could subsequently be used to classify microbiome data streams in less than 3 s. Conclusions: Our method offers a new approach to rapidly process microbiome data streams, allowing samples to be rapidly clustered, indexed and classified. We also provide our implementation, Histosketching Using Little K-mers (HULK), which can histosketch a typical 2 GB microbiome in 50 s on a standard laptop using four cores, with the sketch occupying 3000 bytes of disk space.
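
The abstract above refers to k-mer spectra and to Jaccard similarity between samples. As a rough illustration of those two quantities only, the sketch below counts k-mers exactly and computes a weighted Jaccard similarity between two spectra. It is not HULK's implementation and does no streaming or sketching; the function names, the toy reads and the choice of k = 3 are all assumptions made for this example.

```python
from collections import Counter


def kmer_spectrum(sequence: str, k: int) -> Counter:
    """Count every overlapping k-mer in a sequence (exact, not sketched)."""
    sequence = sequence.upper()
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def weighted_jaccard(a: Counter, b: Counter) -> float:
    """Weighted Jaccard similarity: sum of min counts over sum of max counts."""
    keys = a.keys() | b.keys()
    numerator = sum(min(a[key], b[key]) for key in keys)
    denominator = sum(max(a[key], b[key]) for key in keys)
    # Two empty spectra are treated as identical (an assumption of this sketch).
    return numerator / denominator if denominator else 1.0


# Toy reads, invented for illustration only.
sample_a = kmer_spectrum("ACGTACGTAC", k=3)
sample_b = kmer_spectrum("ACGTACGTTT", k=3)
similarity = weighted_jaccard(sample_a, sample_b)
print(f"weighted Jaccard similarity: {similarity:.2f}")
```

A sketching method would presumably estimate a similarity of this kind from a small fixed-size summary rather than from the full spectra, which is where the speed and the small disk footprint reported above would come from.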

    A full-body transcriptome and proteome resource for the European common carp.

    BACKGROUND: The common carp (Cyprinus carpio) is the oldest, most domesticated and one of the most cultured fish species for food consumption. Besides its economic importance, the common carp is also highly suitable for comparative physiological and disease studies in combination with the animal model zebrafish (Danio rerio). They are genetically closely related but offer complementary benefits for fundamental research, with the large body mass of common carp presenting possibilities for obtaining sufficient cell material for advanced transcriptome and proteome studies. RESULTS: Here we have used 19 different tissues from an F1 hybrid strain of the common carp to perform transcriptome analyses using RNA-Seq. For a subset of the tissues we have also performed deep proteomic studies. As a reference, we updated the European common carp genome assembly using low-coverage Pacific Biosciences sequencing to permit high-quality gene annotation. These annotated gene lists were linked to zebrafish homologs, enabling direct comparisons with published datasets. Using clustering, we have identified sets of genes that are potential selective markers for various types of tissues. In addition, we provide a script for a schematic anatomical viewer for visualizing organ-specific expression data. CONCLUSIONS: The identified transcriptome and proteome data for carp tissues represent a useful resource for further translational studies of tissue-specific markers for this economically important fish species that can lead to new markers for organ development. The similarity to zebrafish expression patterns confirms the value of common carp as a resource for studying tissue-specific expression in cyprinid fish. The availability of the annotated gene set of common carp will enable further research with both applied and fundamental purposes.

    Sensitive detection of mitochondrial DNA variants for analysis of mitochondrial DNA-enriched extracts from frozen tumor tissue

    Large variation exists in mitochondrial DNA (mtDNA) not only between but also within individuals. Tumor-specific mtDNA variation also exists in human cancer. In this work, we describe the comparison of four methods to extract mtDNA as pure as possible from frozen tumor tissue. Also, three state-of-the-art methods for sensitive detection of mtDNA variants were evaluated. The main aim was to develop a procedure to detect low-frequency single-nucleotide mtDNA-specific variants in frozen tumor tissue. We show that of the methods evaluated, DNA extracted from cytosol fractions following exonuclease treatment results in the highest mtDNA yield and purity from frozen tumor tissue (270-fold mtDNA enrichment). Next, we demonstrate the sensitivity of detection of low-frequency single-nucleotide mtDNA variants (≤1% allele frequency) in breast cancer cell lines MDA-MB-231 and MCF-7 by single-molecule real-time (SMRT) sequencing, UltraSEEK chemistry-based mass spectrometry, and digital PCR. We also show de novo detection and allelic phasing of variants by SMRT sequencing. We conclude that our sensitive procedure to detect low-frequency single-nucleotide mtDNA variants from frozen tumor tissue is based on extraction of DNA from cytosol fractions followed by exonuclease treatment to obtain high mtDNA purity, and subsequent SMRT sequencing for (de novo) detection and allelic phasing of variants.